
Conversation

@Thireus (Contributor) commented Oct 30, 2025

  • convert_hf_to_gguf.py - Not touched; use llama.cpp to convert the model instead
  • SYCL and Metal support for imrope not added - see the ggml-metal and ggml-sycl folders of https://github.com/ggml-org/llama.cpp/pull/16780/files (a rough sketch of the imrope indexing idea follows below)
  • Vulkan support for imrope not tested
  • Code not tested against any GGUF (looking for testers...)

Source: ggml-org/llama.cpp#16780
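
For readers unfamiliar with imrope (the interleaved multimodal RoPE variant added for Qwen3-VL in the mainline PR linked above), here is a small standalone sketch of the indexing idea. It does not touch the ggml API; the blockwise-versus-round-robin assignment pattern and the section sizes are assumptions for illustration only, not the actual kernel code the SYCL/Metal backends would need.

```cpp
// Illustrative sketch only: which position component (temporal/height/width/extra)
// each rotary dimension pair uses under blockwise mrope vs an interleaved layout.
#include <array>
#include <cstdio>

enum class comp { T, H, W, E };

static const char * name(comp c) {
    switch (c) {
        case comp::T: return "t";
        case comp::H: return "h";
        case comp::W: return "w";
        default:      return "e";
    }
}

// Blockwise (mrope-style): contiguous runs of dimension pairs use the
// temporal, height and width positions in turn.
static comp blockwise(int pair, const std::array<int, 4> & sec) {
    if (pair < sec[0])                   return comp::T;
    if (pair < sec[0] + sec[1])          return comp::H;
    if (pair < sec[0] + sec[1] + sec[2]) return comp::W;
    return comp::E;
}

// Interleaved (imrope-style, assumed): t/h/w are assigned round-robin across
// dimension pairs, each bounded by its section budget.
static comp interleaved(int pair, const std::array<int, 4> & sec) {
    if (pair % 3 == 0 && pair < 3 * sec[0]) return comp::T;
    if (pair % 3 == 1 && pair < 3 * sec[1]) return comp::H;
    if (pair % 3 == 2 && pair < 3 * sec[2]) return comp::W;
    return comp::E;
}

int main() {
    // Made-up section sizes, purely for illustration.
    const std::array<int, 4> sections = {8, 4, 4, 0};
    const int n_pairs = 16;

    std::printf("pair  blockwise  interleaved\n");
    for (int i = 0; i < n_pairs; ++i) {
        std::printf("%4d  %9s  %11s\n",
                    i, name(blockwise(i, sections)), name(interleaved(i, sections)));
    }
    return 0;
}
```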

@ranilongxi commented

I tried to build this branch CPU-only and with CUDA, but it kept failing:

CUDA:

cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=Release -DGGML_LTO=ON -DGGML_NATIVE=ON -DGGML_CUDA=ON -DGGML_CUDA_F16=ON -DGGML_CUDA_FA_ALL_QUANTS=ON -DGGML_CUDA_DMMV_F16=ON -DGGML_CUDA_FORCE_MMQ=ON -DCMAKE_CUDA_ARCHITECTURES=89 -DLLAMA_CURL=ON -DGGML_BLAS=OFF -DGGML_SCHED_MAX_COPIES=1

FAILED: src/CMakeFiles/llama.dir/llama-load-tensors.cpp.o
.../llama-load-tensors.cpp:1006:13: error: conflicting declaration 'int64_t n_embd'
    int64_t n_embd = hparams.n_embd / (hparams.n_deepstack_layers + 1);
            ^~~~~~
.../llama-load-tensors.cpp:227:40: note: previous declaration as 'const int64_t n_embd'
    [[maybe_unused]] const int64_t n_embd = hparams.n_embd; \
... in expansion of macro 'LOADING_PRELUDE'
.../llama-load-tensors.cpp:1045:13: error: conflicting declaration 'int64_t n_embd'

CPU only:

cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=Release
ninja -C build -j22

FAILED: src/CMakeFiles/llama.dir/llama-load-tensors.cpp.o
.../llama-load-tensors.cpp:1006:13: error: conflicting declaration 'int64_t n_embd'
.../llama-load-tensors.cpp:227:40: note: previous declaration as 'const int64_t n_embd'
... in expansion of macro 'LOADING_PRELUDE'
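
The error boils down to a second declaration of n_embd in a scope where the LOADING_PRELUDE macro has already declared a const n_embd. Below is a minimal standalone sketch of the conflict and one possible way around it; the struct, the macro stand-in, and the name n_embd_base are illustrative only, not necessarily the fix the PR ended up applying.

```cpp
// Minimal sketch of the "conflicting declaration" the compiler reports above.
#include <cstdint>
#include <cstdio>

struct hparams_t {
    int64_t n_embd             = 4096;
    int64_t n_deepstack_layers = 3;
};

// Stand-in for the LOADING_PRELUDE macro, which already declares a const
// n_embd in the enclosing scope.
#define LOADING_PRELUDE \
    [[maybe_unused]] const int64_t n_embd = hparams.n_embd;

int main() {
    hparams_t hparams;
    LOADING_PRELUDE

    // This is what llama-load-tensors.cpp:1006 effectively did -- a second
    // declaration of n_embd in the same scope, hence the error:
    //
    //     int64_t n_embd = hparams.n_embd / (hparams.n_deepstack_layers + 1);
    //
    // One way out is to give the derived quantity its own name (n_embd_base
    // here is a hypothetical name chosen for this sketch):
    const int64_t n_embd_base = hparams.n_embd / (hparams.n_deepstack_layers + 1);

    std::printf("n_embd = %lld, n_embd_base = %lld\n",
                (long long) n_embd, (long long) n_embd_base);
    return 0;
}
```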

@Thireus (Contributor, Author) commented Oct 31, 2025

@ranilongxi, thank you; should be fixed now.

@ikawrakow (Owner) left a review comment


Apart from the comments, LGTM.

But we need to get this tested by someone before merging.

@Thireus (Contributor, Author) commented Oct 31, 2025

Thank you @ikawrakow, I'll do my best to address your comments. There are still some compilation issues, which I'll also resolve.

@Thireus (Contributor, Author) commented Oct 31, 2025

@ikawrakow please let me know if you are happy with how I've addressed your comments: e552942 and 6a24dec

@Thireus (Contributor, Author) commented Nov 1, 2025

Windows test builds: https://github.com/Thireus/ik_llama.cpp/releases (look for the tr-qwen3-vl-merged tags).

@sayap (Contributor) commented Nov 1, 2025

Made an IQ5_KS quant and f32/f16/bf16 mmproj from the PR branch (using mainline convert_hf_to_gguf.py). Works fine so far. Thanks!

@MrHills-2 commented

With llama-server and the OpenAI-compatible API, text works, but sending images returns the error "Failed to parse messages: Unsupported content part type: \"image_url\"".

@Thireus (Contributor, Author) commented Nov 1, 2025

> With llama-server and the OpenAI-compatible API, text works, but sending images returns the error "Failed to parse messages: Unsupported content part type: \"image_url\"".

It's a separate issue and has been mentioned by @ikawrakow - #792 (comment)

